Parquet File Processing in Talend
Category |
|
Prerequisites |
|
Third-party software |
Hadoop cluster |
Description
|
Talend provides a wide range of input components for reading files in different data formats. In big data workloads, the most common formats are Parquet, Avro, and ORC. This solution template covers the core concepts of compression and partitioning, and explains when to use the Parquet file format. It shows how to convert a Kaggle CSV dataset stored in AWS S3 to the Parquet file format and run various analytics on it. The dataset contains loan information from businesses that applied for loans as part of COVID relief. |
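To make the partitioning concept concrete, the sketch below groups records into Hive-style `column=value` directories, the layout commonly used for partitioned Parquet datasets. It is a minimal illustration in plain Python, not the Talend job itself: it writes no Parquet files, and the function `partition_rows`, the sample records, and the column names `state`, `lender`, and `amount` are assumptions made for this example rather than fields taken from the Kaggle dataset.

```python
from collections import defaultdict


def partition_rows(rows, partition_col):
    """Group rows by partition_col, keyed by a Hive-style directory name."""
    partitions = defaultdict(list)
    for row in rows:
        # The partition value is encoded in the directory name, so it is
        # dropped from the stored row itself.
        directory = f"{partition_col}={row[partition_col]}"
        stored = {k: v for k, v in row.items() if k != partition_col}
        partitions[directory].append(stored)
    return dict(partitions)


# Hypothetical loan records; the column names are illustrative only.
loans = [
    {"state": "CA", "lender": "Bank A", "amount": 150000},
    {"state": "NY", "lender": "Bank B", "amount": 90000},
    {"state": "CA", "lender": "Bank C", "amount": 42000},
]

layout = partition_rows(loans, "state")
for directory, rows in sorted(layout.items()):
    print(directory, len(rows))
```

With this layout, a query that filters on the partition column only needs to read the matching directory, which is the main reason to partition on columns that appear often in filters.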